VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization
Authors
Abstract
Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs. However, the trade-off between the quantization bitwidth and the final accuracy is complex and non-convex, which makes it difficult to optimize directly. Minimizing the direct quantization loss (DQL) of the coefficient data is a local optimization method, but previous works often neglect accurate control of the DQL, resulting in a higher loss of the final DNN model accuracy. In this paper, we propose a novel metric, called Vector Loss. Using this new metric, we decompose the minimization of the DQL into two independent processes that significantly outperform the traditional iterative L2-based process in terms of effectiveness. We also develop a quantization solution, called VecQ, which provides minimal quantization loss. In order to speed up the proposed quantization during training, we accelerate it with a parameterized probability estimation and a template-based derivation calculation. We evaluate our algorithm on the MNIST, CIFAR, ImageNet, IMDB movie review, and THUCNews text data sets with numerical models. The results demonstrate that it outperforms state-of-the-art approaches yet offers flexible bitwidth support. Moreover, the evaluation of the quantized models on Salient Object Detection (SOD) tasks shows that they maintain comparable feature extraction quality with 16× weight size reduction.
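The abstract describes decomposing the minimization of the direct quantization loss into two independent processes. As a rough, illustrative sketch only (not the paper's actual Vector Loss formulation), one can split quantization of a weight vector into an integer "orientation" step on a fixed grid, followed by a closed-form least-squares scale; the function name and grid choice here are assumptions for illustration:

```python
import numpy as np

def quantize_vectorized(w, bits=3):
    """Toy two-step quantization sketch (illustrative, not VecQ itself).

    1) Orientation: map each weight to the nearest level of a symmetric
       integer grid after normalizing by the largest magnitude.
    2) Magnitude: pick the single scale alpha that minimizes the L2
       distance between w and alpha * q (closed-form least squares).
    """
    levels = 2 ** (bits - 1) - 1            # symmetric grid, e.g. -3..3 for 3 bits
    unit = w / (np.max(np.abs(w)) + 1e-12)  # normalize to unit range
    q = np.round(unit * levels)             # integer "orientation" vector
    alpha = float(w @ q) / float(q @ q)     # least-squares scale for this q
    return alpha * q, alpha, q

rng = np.random.default_rng(0)
w = rng.normal(size=8)
w_hat, alpha, q = quantize_vectorized(w, bits=3)
# the least-squares scale guarantees the residual shrinks:
print(np.linalg.norm(w - w_hat) < np.linalg.norm(w))  # True
```

Because alpha is the least-squares projection coefficient, the residual norm can never exceed the original norm, which is the sense in which the scale step is "independent" of the grid step in this toy version.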
Similar papers
Quantization-based language model compression
This paper describes two techniques for reducing the size of statistical back-off n-gram language models in computer memory. Language model compression is achieved through a combination of quantizing language model probabilities and back-off weights and the pruning of parameters that are determined to be unnecessary after quantization. The recognition performance of the original and compressed l...
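The quantization half of the scheme above amounts to replacing each stored probability with a small bin index plus a shared lookup table. A minimal sketch, assuming simple uniform binning of log-probabilities (the real systems typically use non-uniform codebooks):

```python
def quantize_logprobs(logprobs, bits=8):
    """Toy sketch: map each log-probability to one of 2**bits bins
    spanning the observed range. Only the bin index per n-gram plus one
    shared bin table is stored, which is the memory win for large tables."""
    lo, hi = min(logprobs), max(logprobs)
    n = 2 ** bits
    step = (hi - lo) / (n - 1) or 1.0        # guard against a degenerate range
    idx = [round((lp - lo) / step) for lp in logprobs]
    table = [lo + i * step for i in range(n)]
    return idx, table
```

Storing an 8-bit index instead of a 32-bit float gives roughly a 4x reduction for the probability arrays, before any pruning.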
Loss-aware Weight Quantization of Deep Networks
The huge size of deep networks hinders their use in small computing devices. In this paper, we consider compressing the network by weight quantization. We extend a recently proposed loss-aware weight binarization scheme to ternarization, with possibly different scaling parameters for the positive and negative weights, and m-bit (where m > 2) quantization. Experiments on feedforward and recurren...
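The ternarization with separate scaling parameters for positive and negative weights mentioned above can be sketched as follows; the threshold fraction and the mean-magnitude scales are common closed-form choices assumed here for illustration, not this paper's loss-aware formulation:

```python
import numpy as np

def ternarize_asymmetric(w, delta_frac=0.7):
    """Toy ternarization with separate scales for + and - weights.

    Weights with |w| below a threshold become 0; the rest map to
    +alpha_p or -alpha_n, where each scale is the mean magnitude of the
    surviving weights of that sign. delta_frac is an illustrative choice.
    """
    delta = delta_frac * np.mean(np.abs(w))
    pos = w > delta
    neg = w < -delta
    alpha_p = w[pos].mean() if pos.any() else 0.0
    alpha_n = -w[neg].mean() if neg.any() else 0.0
    return alpha_p * pos.astype(w.dtype) - alpha_n * neg.astype(w.dtype)
```

Allowing alpha_p != alpha_n lets the quantizer track skewed weight distributions, which a single shared scale cannot.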
Model compression via distillation and quantization
Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning. One aspect of the field receiving considerable attention is efficiently executing deep models in resource-constrained environments, such as mobile or embedded devices. This paper focuses on this problem, and proposes two new compression methods, wh...
Medical Image Compression Using Vector Quantization and Gaussian Mixture Model
Codebook design for vector quantization could be performed using clustering technique. The Gaussian Mixture Modeling (GMM) clustering algorithm involves modeling a statistical distribution by a mixture (or weighted sum) of other distributions. GMM has proven superior efficiency in both time and accuracy and has been used with vector quantization in some applications. This paper introduces a med...
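Codebook design by clustering, as described above, can be sketched with k-means, which is the hard-assignment special case of GMM fitting (a full GMM would use soft responsibilities and covariances); the function below is an assumed illustration, not the paper's algorithm:

```python
import numpy as np

def design_codebook(vectors, k=4, iters=20, seed=0):
    """Toy VQ codebook design by k-means clustering.

    Alternates nearest-codeword assignment with centroid updates and
    returns k centroid vectors to use as the codebook."""
    rng = np.random.default_rng(seed)
    # initialize codewords from k distinct training vectors
    codebook = vectors[rng.choice(len(vectors), k, replace=False)]
    for _ in range(iters):
        # assign each vector to its nearest codeword (Euclidean distance)
        d = np.linalg.norm(vectors[:, None, :] - codebook[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the mean of its assigned vectors
        for j in range(k):
            if (labels == j).any():
                codebook[j] = vectors[labels == j].mean(axis=0)
    return codebook, labels
```

Each image block is then stored as the index of its nearest codeword, so the compression ratio is set by the block size and log2(k) bits per index.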
Journal
Journal title: IEEE Transactions on Computers
Year: 2021
ISSN: 1557-9956, 2326-3814, 0018-9340
DOI: https://doi.org/10.1109/tc.2020.2995593